Abstract

This paper is a survey of the field of Artificial Neural Systems (ANSs). ANSs have a large number of highly interconnected processing elements that demonstrate the ability to learn and generalize from presented patterns. ANSs represent a possible solution to previously difficult problems in areas such as speech processing and natural language understanding. This paper presents a brief history of ANSs, examples of ANS models, and areas where the technology has been applied. Also discussed are the connection between Artificial Intelligence (AI) and ANSs, the computer architectures that are evolving from this field, and ANS algorithms.
1. Introduction

Speech processing, image processing, and robotics are all forms of pattern matching. In each area an input is received and matched to a corresponding output. Humans perform these pattern-matching tasks easily; computers do not. Computers, on the other hand, are faster than humans at algorithmic, computational tasks. The contrast between the processing abilities of computers and humans arises because each processes its information differently.

Most computers process information with a single complex central processing unit (CPU). The human brain processes information using a large number of simple processing elements called neurons. Increasing the pattern-matching ability of computers requires an approach to processing information that differs from the current single-processor architecture. Artificial neural systems (ANSs) are neurally inspired mathematical models that use a large number of simple processing elements (PEs). ANSs approach the pattern-matching problem using the same processing style the brain uses. PEs are organized into layers, where each PE in one layer has a weighted connection to each PE in the next layer. This organization of PEs and weighted connections creates an ANS. An ANS learns patterns by adjusting the strengths (weights) of the connections between PEs. Through these adjustments an ANS exhibits properties of generalization and classification similar to those of humans.

This is a survey of the field of ANS. Included is an explanation of the biological inspirations and mathematical foundations of ANS. The history of ANS and applications using ANS are presented, as well as a discussion of how ANS relates to the field of artificial intelligence. At the end of this survey is an overview of models and architectures used in ANS.

2. The Neuron and Neural Networks
2.1. An Explanation of the Neuron

The basic building block of the nervous system is the neuron, the cell that handles intercommunication of information among the various parts of the body. A neuron consists of a cell body, called a soma, and an axon, or nerve fiber, that connects the cells to one another (see figure 1). Junctions between neurons occur either on the cell body or on spinelike extensions of the cell body called dendrites. The junctions are called synapses. Nerve fibers and dendrites can be treated like insulated conductors for transmitting electrical signals to the neuron [Lindsay77].

A threshold unit collects inputs and produces output only if the sum of the inputs exceeds an internal threshold value. The neuron, in its simplest form, can be considered a threshold unit. As a threshold unit, the neuron collects signals at its synapses and sums them together using its internal summer. If the collected signal strength is great enough to exceed the threshold, a signal is sent out from the neuron along its axon.


Figure 1: The neuron. This figure shows the neuron body (soma) and the components of importance to ANS. The synapses are where signals are collected from the dendrites. The summer is where the signals are summed. The threshold is the internal value that must be exceeded for output. The axon is where any output signals are sent.
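The threshold unit described above can be sketched in a few lines of Python. This is a minimal, hypothetical illustration rather than a model taken from the cited sources: the function name `threshold_unit` and the per-synapse weights are assumptions added here (the text describes only a summer and a threshold; weights are how ANS models express connection strength).

```python
def threshold_unit(inputs, weights, threshold):
    """Hypothetical threshold unit: return 1 (fire) only if the weighted
    sum of the inputs exceeds the internal threshold, else 0."""
    total = sum(x * w for x, w in zip(inputs, weights))  # the "summer"
    return 1 if total > threshold else 0


# With unit weights and a threshold of 1.5, the unit fires only when
# both inputs are active, i.e. it behaves as a logical AND.
for a in (0, 1):
    for b in (0, 1):
        print(a, b, threshold_unit([a, b], [1.0, 1.0], 1.5))
```

Changing only the threshold (for example to 0.5) turns the same unit into a logical OR, which is why early models treated the threshold as the unit's key parameter.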

2.2. Neurally Inspired and Mathematically Supported Models

ANSs are neurally inspired models of the mind [McClelland86, Myers86, Rumelhart86a, Rumelhart86b]. ANSs are not attempts at duplicating the mechanisms of the mind, only attempts at duplicating its functionality. To draw upon an analogy from D. Rumelhart, a leader in the field of ANS from the University of California, San Diego (UCSD), the brain is the computer hardware (mechanisms) and ANS is the computer software (functionality). Extending the analogy, learning can be considered programming the mind. The objective behind ANS is capturing the functionality of the mind.

Constantly changing systems are called dynamical systems. Dynamical systems are described by energy functions and probability distributions, and ANSs are supported by the mathematics of such systems. A more precise definition of ANSs is dynamical systems with adaptive or selectable energy functions that can carry out useful information processing by means of their state response to initial or continuous input [Myers86]. Rephrased, ANSs are directed graphs whose state changes when they are provided with input.

2.3. Constraints and Assumptions About How the Brain Processes Information

Neural models are based on how the mind processes information [Amari71, Hecht-Nielsen86a, McClelland86]. The performance of the mind and the performance of the digital computer are compared in an attempt to understand how the mind is so adaptable, resilient, and powerful. ANS models incorporate information about brain processing during modelling. The information gathered has shown the presence of physical brain constraints and has led to assumptions about brain processing. The following sections discuss these assumptions and constraints and contrast them with the digital computer.

2.3.1. Brain Speed Versus Computer Speed

Cycle time is the time taken to process a single piece of information from input to output. The cycle time of the most advanced computers is 1 nanosecond, corresponding to one clock cycle of the CPU. The average cycle time for a neuron in the brain is 2 milliseconds [Cottrell84]. The difference in speed is therefore a factor of 2 ms / 1 ns = 2 × 10^6; the computer is roughly two million times faster.

2.3.2. Parallel Order Versus Serial Order and the 100 Step Program

The most advanced computers are able to process information more than a million times faster than the brain, yet in some respects the brain is superior. The difference between the two machines is the processing order: the brain processes its information in parallel, while the computer processes its information serially. A constraint called the 100 step program constraint [Feldman82] follows from this observation. If the mind reacts to a given stimulus (e.g., answering a true-false question) in between 1/5 and 1 second, and the cycle time of a neuron averages 2 milliseconds, then in the best case a decision is reached in 100 neuron cycle times. A program that processes information the way the brain does should therefore not exceed 100 sequential steps. In contrast to large software programs operating serially on conventional computers, the mind operates with a massive number of small programs that execute in parallel.
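The arithmetic behind these two constraints can be checked directly. The sketch below uses only the figures quoted in the text (a 1 ns computer cycle, a 2 ms neuron cycle, and reaction times of 1/5 s to 1 s); the constant and function names are illustrative, and all times are kept in integer nanoseconds to avoid floating-point rounding.

```python
# Figures quoted in the text, in integer nanoseconds.
COMPUTER_CYCLE_NS = 1            # one CPU clock cycle
NEURON_CYCLE_NS = 2_000_000      # 2 milliseconds per neuron cycle


def speed_ratio():
    """Number of computer cycles that fit into one neuron cycle."""
    return NEURON_CYCLE_NS // COMPUTER_CYCLE_NS


def serial_steps(reaction_time_ns):
    """Maximum number of strictly sequential neuron cycles within a reaction time."""
    return reaction_time_ns // NEURON_CYCLE_NS


print(speed_ratio())                  # 2000000
print(serial_steps(200_000_000))      # 1/5 second -> 100 steps
print(serial_steps(1_000_000_000))    # 1 second   -> 500 steps
```

The fastest reaction (1/5 s) leaves room for only about 100 sequential neural steps, which is the bound the 100 step program constraint refers to.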

2.3.3. Number and Complexity of Neurons

The number of neurons in the brain is approximately 10^11, with about 10^3 to 10^4 connections per neuron. An ANS that models the brain need not simulate more than 10^11 neurons. Although 10^11 is admittedly large, the size is finite and constrained. A neural model will require the ability to handle large numbers of processing units. In addition to the large number of neurons, studies have also found that the neuron is not a simple threshold unit [Levy82]; the neuron is actually a complex computing device. Recent studies have shown that computing does not take place solely inside the soma: computations also occur outside the neuron body, in the dendrites and at the synapses. These two findings remind ANS technologists that the brain is a large and complex device and that models will eventually have to represent such size and complexity.
2.3.4. Connections Hold the Knowledge

The number of connections between neurons in the brain is relatively fixed. Very few new pathways are formed in adult brains, and these are assumed to be for long-term memory [Feldman82]. Because so few new connections form, and because forming them takes time, new knowledge is believed to be captured by changing the strengths of the connections [Cottrell84, McClelland86]. An ANS therefore does not need to add and remove connections to simulate the processing of the brain; it only needs to change their strengths.

2.3.5. Fault-tolerant Brain Versus Fault-intolerant Computers

The brain is very resistant to noise and rather robust, in the sense that damage (faults) to individual neurons does not degrade the overall performance of the brain [Cottrell84]. Because of this graceful degradation, the brain can be said to be fault-tolerant. The concept of fault tolerance supports the theory that the brain carries a distributed representation of the world in which no one neuron carries a specific thought or idea. Thoughts and ideas are spread out through many neurons and interconnections. Most computers, by contrast, are fault-intolerant. Each location in computer memory holds a specific piece of information. If that memory location is corrupted, the knowledge is lost, creating a fault in the system.

2.3.6. No Executive Control in the Brain

The brain does not have any specific area with executive control [Cottrell84]. Each neuron computes an output based solely on its inputs. A neuron cannot access information held by other neurons unless it is directly connected to them; it cannot look around to see what the other neurons are doing. When designing an ANS, only the inputs to a neuron need to be considered. Control in a computer contrasts sharply with control in the mind: a computer uses its CPU, whereas the mind uses control distributed throughout the brain.

3. History of ANS

ANS research began in the early 1940s. The field is young, and many of the people who were instrumental in its inception are still very active in the field today. The following sections discuss the people and accomplishments of ANS from 1943 to the present.

3.1. McCulloch and Pitts (1943)

McCulloch and Pitts made the first mathematical model of an ANS [Hecht-Nielsen86a, Rumelhart86a]. This model showed that an ANS could compute, a theorized but previously unproven concept. Although the model was able to compute, it could not learn.
3.2. Hebb (1949)

D. Hebb brought learning to ANS technology [Hecht-Nielsen86a, McClelland86, Rumelhart86a]. Hebb’s book The Organization of Behavior, published in 1949, describes a system of correlation learning at the synapses of the neuron. From his observations of how learning occurred in the neuron, Hebb developed a learning rule for ANS, the Hebbian Learning Rule, which states: "If neuron A repeatedly contributes to the firing of neuron B, then A’s efficiency in firing B increases." Since the formulation of Hebb’s rule, a restatement has emerged that says "the strength of the synaptic connections are changed in proportion to the difference between the target and actual output of a neuron" [Jorgensen86]. To explain: if the neuron has a positive actual output but is expected to be negative (target), the synaptic connections to the neuron are negatively reinforced. If the neuron has a negative actual output and a positive target output, the synaptic connections are positively reinforced. If the actual and target outputs are the same, the synaptic connections are left unchanged.
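Both forms of the rule can be written as small weight-update functions. The sketch below is illustrative only: the names, the bipolar (+1/-1) outputs, and the learning rate `rate` are assumptions. `hebb_update` is Hebb's correlation form (strengthen a weight when its input and the output are active together), while `error_correction_update` is the target-versus-actual restatement, which is essentially the perceptron or delta-rule form of learning rather than Hebb's original rule.

```python
def step(net):
    """Bipolar output: +1 if the net input is positive, otherwise -1."""
    return 1 if net > 0 else -1


def hebb_update(weights, inputs, output, rate=0.5):
    """Hebb's correlation form: change each weight by rate * input * output."""
    return [w + rate * x * output for w, x in zip(weights, inputs)]


def error_correction_update(weights, inputs, target, rate=0.5):
    """Restated rule: change each weight in proportion to (target - actual) * input.

    Actual +1 with target -1 weakens weights on active inputs; actual -1 with
    target +1 strengthens them; matching outputs leave the weights unchanged.
    """
    actual = step(sum(w * x for w, x in zip(weights, inputs)))
    error = target - actual          # 0, +2 or -2 for bipolar outputs
    return [w + rate * error * x for w, x in zip(weights, inputs)]


w = [0.0, 0.0]
w = error_correction_update(w, [1, 1], target=1)     # actual was -1, so weights grow
print(w)                                             # [1.0, 1.0]
print(error_correction_update(w, [1, 1], target=1))  # already correct: unchanged
```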

3.3. Lashley (1950)

Lashley’s studies of the mind led to his insistence that the mind has a distributed knowledge representation [McClelland86]. Lashley’s idea was that knowledge is not stored locally but in a distributed manner. Rephrasing this, there are no special cells for special memories; rather, many cells each carry a portion of a memory.

3.4. Edmonds and Minsky (1951)

D. Edmonds and M. Minsky were the first to build a physical ANS [Bernstein81, McClelland86, Rumelhart86a]. Their model, built at Harvard in the summer of 1951, was constructed of tubes, motors, and clutches. The clutches were adjusted in accordance with the Hebbian Learning Rule to store the connection strengths. The machine was able to store as many as 40 patterns of 40 binary digits, but was too inflexible for further work.

3.5. Rosenblatt (1957)

F. Rosenblatt became a prominent ANS researcher with his creation of a neural model called the perceptron [Hecht-Nielsen86a, Larson86, McClelland86, Rumelhart86a]. The perceptron showed remarkable promise as a computing device, being able to learn patterns and generalize from the patterns learned. Rosenblatt studied his model with mathematical analysis and digital computer simulations. The perceptron brought many researchers to the field of ANS.
3.6. Minsky and Papert (1969)

M. Minsky and S. Papert were responsible for the demise of the perceptron, as well as of much of the ANS research during the late 1960s [Hecht-Nielsen86a, Larson86, McClelland86, Rumelhart86a]. Minsky and Papert were irritated at Rosenblatt for overclaiming the perceptron’s abilities. In their book Perceptrons, published in 1969, the two researchers showed that the single-layer perceptron was an inadequate model because it could not represent the basic exclusive-or (XOR) function. Minsky and Papert were so convincing that most ANS research at the time was halted.
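The XOR limitation is easy to reproduce. The sketch below trains a single threshold unit with the perceptron error-correction rule (two input weights plus a bias input fixed at 1; the names and the 100-epoch cap are illustrative assumptions). It learns AND, which is linearly separable, but no setting of its weights can classify all four XOR cases, because no single straight line separates the XOR inputs that should output 1 from those that should output 0.

```python
def predict(w, x):
    """Threshold unit: 1 if the weighted sum (bias included in x) is positive."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) > 0 else 0


def train_perceptron(samples, epochs=100):
    """Perceptron learning on (x1, x2, target) triples; returns (weights, errors)."""
    w = [0, 0, 0]                      # two input weights plus a bias weight
    for _ in range(epochs):
        mistakes = 0
        for x1, x2, target in samples:
            x = (x1, x2, 1)            # constant bias input of 1
            y = predict(w, x)
            if y != target:
                mistakes += 1
                w = [wi + (target - y) * xi for wi, xi in zip(w, x)]
        if mistakes == 0:              # every sample classified correctly
            break
    errors = sum(predict(w, (x1, x2, 1)) != t for x1, x2, t in samples)
    return w, errors


AND = [(0, 0, 0), (0, 1, 0), (1, 0, 0), (1, 1, 1)]
XOR = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]
print("AND errors:", train_perceptron(AND)[1])   # 0
print("XOR errors:", train_perceptron(XOR)[1])   # always at least 1
```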

3.7. Grossberg (1968)

Despite the criticism from Minsky and Papert, ANS research did continue on a small scale. S. Grossberg studied neurally inspired mechanisms in perception, memory, and later vision [Grossberg68, Hecht-Nielsen86a, McClelland86]. Grossberg’s research has focused on the mind, using ANSs to model his ideas. Grossberg’s mathematical analysis of the properties of ANS models has led to many insights, including neurally inspired models of perception and memory. Grossberg’s research has recently focused on vision; he has just finished a study that used an ANS that mimics human eye movements [Grossberg85].

3.8. Willshaw (1969)

As a member of the research group at Edinburgh University under Longuet-Higgins, D. Willshaw made important contributions toward understanding memory [McClelland86]. Willshaw did mathematical analysis of distributed memory models and found properties associated with various modeling schemes. Later, in collaboration with Longuet-Higgins, Willshaw did work with holophones [Willshaw69]. A holophone is a man-made representation of memory that is useful in the analysis of memory systems.

3.9. Amari, Anderson, and Kohonen (1971)

Three researchers who did significant work in 1971 are Amari, Anderson, and Kohonen. Amari’s work was on Boolean ANS theory, concerning ANSs that contain only Boolean values [Amari71]. Anderson’s work was on linear associative memory, a memory that is completely distributed [McClelland86]. Kohonen’s research concerns self-organizing associative memory, studying how the mind organizes the information it stores [Kohonen84].

3.10. Rumelhart and McClelland (1977)

During the late 1970s ANS technology became prominent in the field of cognitive psychology, where ANSs were used as cognitive models. Two cognitive psychologists who initiated this movement are D. Rumelhart of UCSD and J. McClelland of Carnegie-Mellon University (CMU). McClelland and Rumelhart were inspired by the HEARSAY speech understanding system developed at Carnegie-Mellon University [Hecht-Nielsen86a, McClelland86]. In their efforts toward building a cognitive model for speech understanding, they rediscovered ANSs. Rumelhart and McClelland’s models are called parallel distributed processing (PDP) models [McClelland85, Rumelhart86b]. Many ANS learning paradigms have been studied using PDP, including competitive learning [Rumelhart86a], Boltzmann Machines [Hinton86b, Rumelhart86d], and most recently Error Propagation [Rumelhart86c].

3.11. Hecht-Nielsen (1977)

Commercial research is also being conducted at TRW. This research was headed by R. Hecht-Nielsen [Hecht-Nielsen86a, Hecht-Nielsen86b] and focused on the application aspects rather than the research aspects of ANS technology. While working at TRW, Hecht-Nielsen developed two neurocomputers: the Mark III, which is commercially available [TRW86], and the DARPA-financed Mark IV. In 1986 Hecht-Nielsen left TRW and started his own neurocomputer company. His company has developed a neurocomputer called the ANZA that fits on a card and plugs into an IBM PC AT [HNC86].

3.12. Hopfield (1982)

The recent resurgence of interest in ANS technology is mostly attributed to J. Hopfield of the California Institute of Technology (CalTech) [Hecht-Nielsen86b]. Hopfield delivered a paper to the National Academy of Sciences in 1982 that proved that an ANS of interconnected processing elements (with symmetric connections) would seek an energy minimum [Hopfield82]. This paper showed that ANSs have emergent collective computational abilities; that is, useful computational properties emerge from the collective behavior of many simple processing elements. These emergent collective properties appealed to a wide range of disciplines, most notably physics, computer science, cognitive psychology, and neuroscience [Larson86, Myers86]. Since publishing the paper, Hopfield has continued to study the neurobiological aspects of ANS [Hopfield84].
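Hopfield’s energy-minimization result can be illustrated with a small simulation. In this sketch (an illustration under stated assumptions, not Hopfield’s own formulation or code), weights are set by the Hebbian outer-product rule, connections are symmetric with no self-connections, and units are updated one at a time; under those conditions the energy E = -1/2 Σ w_ij s_i s_j can never increase, so the network settles into a minimum. The single stored pattern and the cue with two flipped bits are arbitrary choices for the demonstration.

```python
def hebbian_weights(patterns):
    """Symmetric outer-product weights with a zero diagonal."""
    n = len(patterns[0])
    w = [[0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    w[i][j] += p[i] * p[j]
    return w


def energy(w, s):
    """Hopfield energy E = -1/2 * sum_ij w[i][j] * s[i] * s[j]."""
    n = len(s)
    return -0.5 * sum(w[i][j] * s[i] * s[j] for i in range(n) for j in range(n))


def settle(w, state, sweeps=3):
    """Asynchronous updates in a fixed order, recording the energy after each one."""
    s = list(state)
    energies = [energy(w, s)]
    for _ in range(sweeps):
        for i in range(len(s)):
            h = sum(w[i][j] * s[j] for j in range(len(s)))
            if h > 0:
                s[i] = 1
            elif h < 0:
                s[i] = -1
            # h == 0: leave the unit unchanged
            energies.append(energy(w, s))
    return s, energies


stored = [1, 1, -1, -1, 1, -1, 1, -1]
cue = [-1, 1, -1, 1, 1, -1, 1, -1]       # bits 0 and 3 flipped
w = hebbian_weights([stored])
final, energies = settle(w, cue)
print(final == stored)                                        # True
print(all(b <= a for a, b in zip(energies, energies[1:])))    # True
```

The stored pattern sits at the energy minimum, so settling from a nearby cue doubles as recall; this is the sense in which an energy-seeking network can act as a content-addressable memory.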

3.13. Cooper and Elbaum (1983)

Former Brown University physicists L. Cooper, a Nobel laureate, and C. Elbaum started their own company, Nestor Inc., in 1983 [Larson86, Nestor86]. Cooper and Elbaum, like Hecht-Nielsen, are interested in the commercial applications of ANS. Nestor’s commercial projects include hand-written computer input systems [Nestor86, Reilly82], speech recognition [Epstein86], and 3-dimensional graphics [Rimey86].
3.14. Kosko (1985)

B. Kosko of Verac Corporation has done research with fuzzy logic that has overlapped into ANS [Chester86, Kosko86]. Fuzzy logic is a means of representing and reasoning with unclear and non-specific information (fuzzy data). Kosko has worked out a way to integrate ANS and fuzzy logic using a system called fuzzy cognitive maps. One application of this melding of fuzzy logic and ANS has been in radar image processing.

4. The AI/ANS Connection

The work being done in artificial intelligence (AI) and ANS overlaps. In some instances AI and ANS have the same goals; in others they do not. This section discusses where the two fields overlap and where they diverge, as well as how the two technologies could be melded.

ANS technology is aimed at developing human-made systems that can perform the same type of information processing that the brain performs. ANS developments include real-time performance in pattern recognition (speech recognition), knowledge processing given inexact and incomplete knowledge (image processing), and precise control in multiple-constraint environments (robotics) [Hecht-Nielsen86b, Larson86]. ANS technology is so different from conventional computer technologies that it is necessary to create a new architecture to support it. ANS processors (neurocomputers) are the new architectures that have been produced to accommodate ANS technology.

There are areas in which AI and ANS have the same goals. Over the past 30 years, the AI community has studied the areas where ANS technology is currently being applied. Speech recognition, image processing, and robotics have been assessed to be difficult areas in AI that yield slow progress [Hecht-Nielsen86b]. Areas where AI machines and conventional computers are superior to ANS are algorithmic, logic, and symbolic processing.

ANS technology is aimed toward the difficult areas of AI. AI computers are not well suited for adapting and generalizing, but this is an area where ANS performs well. Areas such as expert systems and symbolic computing are better suited to LISP-oriented AI machines. It is not the intent of ANS technology to replace AI technology; the two technologies stand to profit most by being melded into one machine. By installing an ANS processor in an AI machine, and having the AI machine call upon the ANS processor when needed, a combined environment with improved performance is created. The ANS processor is used where it is best suited, as a specialized support subsystem for an AI computing system.

5. ANS Applications

ANS has been applied in a variety of areas, and many more are yet to be discovered [Port87]. ANS is good at specific tasks. One task ANS is able to perform is that of an associative memory. An associative memory processes all possible outputs for a given input simultaneously, eventually finding the proper output in constant time [Kohonen84]. Another task ANS is able to perform is creating generalized representations of presented input [Epstein86, Larson86]. Extending this idea, ANS is able to process information given only a portion of the input. This is a useful feature in that ANS does not need to be supplied with all the input information to produce the proper output [Larson86, McClelland86]. Similarly, if partially incorrect or unclear (fuzzy) information is given to an ANS, it will make a "best guess," producing an output from the given input based upon the generalized internal representation [Chester86, Kosko86]. Yet another task at which ANS is competent is solving multiple simultaneous constraint problems. ANS is able to process many inputs simultaneously and produce an output based on those inputs. Because of this ability to process a large amount of information simultaneously, robotics is an area that is well suited for ANS [Myers86, Nestor86].
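The associative-memory and partial-input behavior described above can be sketched with a linear associator of the kind attributed to Anderson in Section 3.9. In this hypothetical example, three key/value pairs are superimposed in one weight matrix W = Σ v k^T; recall is a single matrix-vector product whose cost depends on the vector sizes, not on how many pairs are stored. The keys are deliberately chosen to be mutually orthogonal so that recall is exact, and each cue supplies only the first half of a key (the missing half is set to zero, meaning "unknown").

```python
def build_memory(pairs):
    """Superimpose all (key, value) associations in one matrix: W = sum of v k^T."""
    rows, cols = len(pairs[0][1]), len(pairs[0][0])
    w = [[0] * cols for _ in range(rows)]
    for key, value in pairs:
        for i in range(rows):
            for j in range(cols):
                w[i][j] += value[i] * key[j]
    return w


def recall_value(w, cue):
    """One matrix-vector product, thresholded to a bipolar (+1/-1) output."""
    return [1 if sum(wij * c for wij, c in zip(row, cue)) > 0 else -1 for row in w]


keys = [
    [1, 1, 1, 1, 1, 1, 1, 1],
    [1, -1, 1, -1, 1, -1, 1, -1],
    [1, 1, -1, -1, 1, 1, -1, -1],
]                                     # mutually orthogonal keys
values = [[1, -1, 1], [-1, 1, 1], [1, 1, -1]]
memory = build_memory(list(zip(keys, values)))

for key, value in zip(keys, values):
    partial = key[:4] + [0, 0, 0, 0]  # only half of the key is supplied
    print(recall_value(memory, partial) == value)   # True for each pair
```

With non-orthogonal keys the stored associations interfere with one another, so real associative memories trade exactness for capacity; the orthogonal keys here simply keep the example easy to verify.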

The tasks that ANS technology performs well are being applied in many areas. In the following three sections, examples of ANS applications are presented. The first section discusses the NETtalk text-to-speech converter built by Sejnowski and Rosenberg, the second presents a cognitive psychology experiment in natural language processing performed by Rumelhart and McClelland, and the final section briefly mentions other areas where ANS has been applied.

5.1. Sejnowski/Rosenberg’s NETtalk

T. Sejnowski of Johns Hopkins University (JHU) and C. Rosenberg of Princeton University (PU) constructed an ANS application that performed text-to-speech conversion [Sejnowski86]. A phoneme is a symbolic representation of a single speech sound. Sejnowski’s ANS maps text (input) to phonemes (output). The phonemes output by the ANS are in turn passed to a phoneme-to-speech synthesizing device called the DECtalk. The ANS learned to read text aloud by being successively presented with a window of seven letters from a text (corpus), using the middle letter as the target that the single phoneme output would represent. By mapping seven letters to a single phoneme, the ANS incorporated context into the conversion process: the middle letter of the seven is the letter that the phoneme represents, and the three preceding and three following letters are context for the letter being represented. After the seven letters are presented, the window is shifted by one letter and the process is repeated. This process continues for the whole corpus being presented. The ANS model used the Rumelhart/Williams Error Propagation algorithm (discussed in detail later) to learn the mapping from input to output.
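The seven-letter sliding window can be sketched as follows. This is a hypothetical illustration of how the training windows are formed, not NETtalk’s code: the padding character `_`, used so that the first and last letters also get a full window, is an assumption, and the mapping of each window to a phoneme (the network’s job) is not shown.

```python
def windows(text, size=7, pad="_"):
    """Yield (window, centre_letter) pairs for every letter of `text`.

    Each window holds `size` letters; the centre letter is the one whose
    phoneme the network must output, and the letters on either side are
    its context. The text is padded so edge letters get full windows.
    """
    half = size // 2
    padded = pad * half + text + pad * half
    for i in range(len(text)):
        window = padded[i:i + size]      # shift the window by one letter each time
        yield window, window[half]


for window, centre in windows("hello"):
    print(window, centre)
```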

The ANS begins in an untrained state with random connection strengths. After a short training period, the output from the DECtalk becomes a continuous and eerie babbling. At this stage of the training, all speech is connected and only one verbal sound is heard. As the ANS continues to learn, the sounds separate and more than one verbal sound is heard; at this stage, the output begins to sound like an infant. As training continues further, the output from the DECtalk begins to sound like a young child talking, and words are clearly distinguishable.

The NETtalk ANS captured a large number of the rules necessary for speech synthesis, for example, properly pronouncing the "a" in both "say" and "ran." The same results are possible from commercial text-to-speech systems, but it has taken years of development and study to learn the same rules that the NETtalk ANS learned autonomously overnight. The development time for the NETtalk ANS was only three months, significantly less than that of its commercial counterparts.

5.2. Rumelhart/McClelland’s Natural Language Processing

Two noted cognitive scientists, D. Rumelhart of UCSD and J. McClelland of CMU, have created an ANS that learns the past tense of English verbs [Rumelhart86d]. The objective of the study was to determine whether an ANS model acquires the rules for forming the past tense of verbs in the same way children do. An ANS was designed that took as input a phonemic representation of a root verb and gave as output a phonemic representation of the past tense of that root verb. The ANS operated as follows: (1) the root form of the verb was presented as input, (2) the past tense of the root form was the targeted output, and (3) errors between the actual output and the targeted past tense were corrected using the Boltzmann Machine Learning Rule (discussed in detail later).

A child progresses through three stages when acquiring the rules for forming the past tense of English verbs [Brown73, Ervin64]. In stage one there is no evidence of any rules being formed by the child. Stage two shows an implicit knowledge of linguistic rules: the child applies the rules to both nonsense and real words and over-regularizes verbs, for example, regularizing the verb "come" to "comed." In the final stage both the regular and irregular forms of the verbs exist; the child has learned both the rules and the exceptions.

Tests were conducted at various stages throughout the training of the ANS. With a limited amount of training, the ANS exhibited stage one results, showing no rule formation. As the training continued, stage two results were seen. At this point in the training, the ANS made the same mistakes that children make, regularizing irregular verbs as well as regular ones. Finally, after an abundance of training, stage three results were achieved: the ANS had learned both the rules for forming the past tense of root verbs and the exceptions to those rules.

Another result of the training was that the ANS was able to respond to verbs it had never seen before. This result showed that an ANS was able to abstract from what it had learned, applying its knowledge to unknown root verb forms and forming the proper past tense. In summary, this study has shown that an ANS is able to learn and generalize from the information given, as well as apply what it has learned to previously unknown information. This study shows promise for real-world situations that involve inexact data and where a "best guess" is sufficient. Current computer software does not handle inexact data well; it is fault-intolerant and requires specific input. By contrast, ANS has the intrinsic property of being fault-tolerant and able to handle unspecific data gracefully.

5.3.  Other  Application  Areas 

Natural language processing (NLP) as an ANS application has been explored by G. Cottrell of UCSD and S. Small of the University of Rochester (UR) [Cottrell84]. Cottrell and Small built an ANS that performed word sense discrimination, teaching an ANS to determine from the context which sense of a word is correct. For example, the ANS can distinguish the meanings that the word "threw" conveys in the sentences "Bob threw a fight" and "Bob threw a ball." Other NLP work involving ANSs has been done by M. Fanty at UR [Fanty85]. Fanty developed an algorithm that constructs an ANS for a context-free grammar and uses the resulting ANS as a parser.

The ANS approach has been well received in the area of image processing. One example is the Mingolla/Grossberg Vision Processing Network, which has demonstrated that template-driven image segmentation and shift/scale/rotation invariant image pattern recognition are possible using an ANS [Hecht-Nielsen86b].

In another image processing application using ANS, N. Farhat of the University of Pennsylvania (UPenn) has trained an ANS to discern radar images of various aircraft [Larson86]. Results of the application have shown recognition of a bomber with only 20 percent of the image supplied.

Nestor Inc. has developed an ANS application that accepts writing on a digitized pad as computer input [Nestor86]. The ANS learns the idiosyncrasies of anyone's handwriting, allowing more direct input to the computer. The company's focus is an ANS that takes kanji (Japanese characters) as input and converts it to computer character form, thus eliminating a difficult data-entry task.

Current applications of ANSs in speech research have been directed toward discovering a method for speaker-independent recognition. J. Elman and D. Zipser have used ANSs in an attempt to discover the hidden features in speech that allow humans to distinguish words [Elman87]. Another speech-related project uses spatiotemporal pattern matching to achieve speaker-independent recognition [Hecht-Nielsen86c].

6.  ANS  Models 

There are many different ANS models. This paper presents three of the prevalent models in ANS technology: the Hebb/Hopfield model [Hopfield82, Jorgensen86], the Boltzmann Machine model [Hinton86b, Rumelhart86d], and the Error Propagation model [Hecht-Nielsen86a, Rumelhart86c]. Each model is successively more complex and more successful. Every model has processing elements (PEs) with adjustable-strength connections from other PEs; the models differ in how the connection strengths are adjusted and how the connections and PEs are arranged into an ANS.

The following section explains the PE and its similarities to the neuron. It is followed by three sections, each explaining one of the aforementioned ANS models.

6.1.  The  Processing  Element 

The PE (see figure 2) consists of weighted input connections ($w_0, \ldots, w_N$), a summation function, a threshold function, and an output value [Hecht-Nielsen86a, McClelland86, Rumelhart86b].


Figure 2: An ANS PE. The inputs are the weighted connections ($w_0, \ldots, w_N$) from the N elements of the layer below to the $j$-th PE. The weighted inputs are added together by a summation function that produces the value $NET_j$. The output of the summation function is passed through a threshold function $f(NET_j)$. The output of the threshold function is the net output value of the PE, $OUT_j$.

Each of the components of this PE corresponds to a component of the neuron discussed in section 2 [Jorgensen86]. The correspondence is summarized in table 1.
Table 1: Comparison of components of a PE and a neuron.

Correspondence Between Neuron and PE

Neuron       PE
---------    ------------------
Synapses     Weights
Summer       Summation Function
Threshold    Threshold Function
Axon         Net Output

By combining many of these PEs, we form an ANS. The ANS is constructed in layers. Inputs from the environment (external to the ANS) enter the input layer of the ANS, and outputs leave the output layer of the ANS and enter the environment. Any layers between the input and output layers are called hidden layers. Hidden layers are not accessible from the environment; they are accessible only within the ANS. Connections run from input toward output, but not from output to input. Under this definition, an ANS can be considered a hierarchical directed graph that allows information to flow only from parent (input) to child (output). Connections in an ANS are made from every PE on the parent level to every PE on its child level, creating a completely interconnected ANS. Information is stored (the ANS learns) by adjusting the weights (connection strengths) between PEs. Figure 3 shows a completely interconnected ANS that has 5 PEs on the input layer ($in_0, \ldots, in_4$) and 5 PEs on the output layer ($out_0, \ldots, out_4$).
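The PE of figure 2 and a completely interconnected layer can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the paper's implementation: the hard 0/1 `step` threshold is an assumed threshold function, `w[i][j]` is taken to be the weight from input PE `i` to output PE `j`, and the function names are hypothetical.

```python
from typing import Callable, Sequence


def step(net: float) -> float:
    """Assumed hard threshold: fire (1.0) when NET_j > 0, otherwise 0.0."""
    return 1.0 if net > 0 else 0.0


def pe_output(weights: Sequence[float], inputs: Sequence[float],
              threshold: Callable[[float], float] = step) -> float:
    """One PE: summation function (NET_j) followed by threshold function (OUT_j)."""
    net = sum(w * x for w, x in zip(weights, inputs))
    return threshold(net)


def layer_output(w: Sequence[Sequence[float]], inputs: Sequence[float],
                 threshold: Callable[[float], float] = step) -> list:
    """Completely interconnected layer: w[i][j] joins input PE i to output PE j."""
    n_out = len(w[0])
    return [pe_output([w[i][j] for i in range(len(inputs))], inputs, threshold)
            for j in range(n_out)]


# Two input PEs feeding three output PEs.
w = [[1.0, -1.0, 0.5],
     [1.0, 1.0, -2.0]]
print(layer_output(w, [1, 1]))   # -> [1.0, 0.0, 0.0]
```

Passing a different `threshold` (for example a sigmoid) changes only the last stage of each PE; the summation stays the same.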

6.2.  The  Hebb/Hopfield  Model 

ANSs are dynamical (constantly changing) systems. Because of this dynamic nature, energy functions and probability relationships are used to describe and model ANSs mathematically. An ANS is a mathematical model in an N-dimensional energy space, where N is the number of connections in the most interconnected PE. The Hebb and Hopfield models describe an energy space in which the system seeks a local energy minimum from its point of entry [Hopfield82, Jorgensen86]. Once the system is given a specific input pattern (an entry point into the energy space), it computes the output pattern by settling into the nearby local minimum.

Figure 3: A two-layer ANS with 5 PEs on the input layer ($in_0, \ldots, in_4$) and 5 PEs on the output layer ($out_0, \ldots, out_4$). Each PE on the input layer is connected to every PE on the output layer, creating a completely interconnected ANS.

An ANS must be able to adjust itself so that it settles into the proper energy minimum from each entry point. Adjusting the weights (connection strengths) in an ANS creates the proper energy minimum for a given input. The Hebb and Hopfield models differ in the method used to adjust the weights. The Hebb model uses an algorithm that increases the strength of connections between PEs that are both positive, decreases the strength of connections between PEs that are both negative, and leaves connections between positive/negative PEs unchanged. The Hopfield model uses an algorithm that assigns a positive value to connections between two PEs that are either both positive or both negative, and assigns a negative value to connections between PEs that have mismatched (positive/negative) values. Apart from how the weights are adjusted, the Hebb and Hopfield models are the same.

One use of the Hebb/Hopfield models is as an associative memory (discussed in section 3). An associative memory is able to reconstruct a complete pattern from only a portion of it [Kohonen84]. For example, such an ANS can store (by adjusting its weights) the binary value "101." When the ANS is presented with the incomplete input "1?1" (where the ? means unknown), it reconstructs the missing information and outputs "101." The Hebb and Hopfield models that accomplish this are described in appendix 1.
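The "1?1" example can be sketched as a Hopfield-style associative memory. This is a hedged illustration: the weights follow the Hopfield rule given in appendix 1, but the recall loop is an assumed state-dependent update (each PE sums weight times the current output of the other PEs and fires if the sum is at least 0), which differs from the simpler recall in appendix 1. The `energy` function is the standard Hopfield energy with zero thresholds, which the text above describes only in words. All function names are hypothetical.

```python
def hopfield_weights(pattern):
    """Store one 0/1 pattern: w_ij = (2p_i - 1)(2p_j - 1) for i != j, w_ii = 0."""
    n = len(pattern)
    return [[0 if i == j else (2 * pattern[i] - 1) * (2 * pattern[j] - 1)
             for j in range(n)] for i in range(n)]


def energy(w, state):
    """Assumed Hopfield energy with zero thresholds: E = -1/2 * sum_ij w_ij s_i s_j."""
    n = len(state)
    return -0.5 * sum(w[i][j] * state[i] * state[j]
                      for i in range(n) for j in range(n))


def complete(w, partial, guess=0):
    """Replace unknown bits (None) by a guess, then update PEs one at a time,
    using the current state, until no PE changes (a local energy minimum)."""
    state = [guess if bit is None else bit for bit in partial]
    changed = True
    while changed:
        changed = False
        for j in range(len(state)):
            net = sum(w[i][j] * state[i] for i in range(len(state)))
            new = 1 if net >= 0 else 0
            if new != state[j]:
                state[j], changed = new, True
    return state


W = hopfield_weights([1, 0, 1])           # store "101"
print(complete(W, [1, None, 1]))          # "1?1" -> [1, 0, 1]
print(energy(W, [1, 1, 1]), energy(W, [1, 0, 1]))
```

For a symmetric weight matrix with a zero diagonal, as here, each change a PE makes either lowers the energy or (on a tie) leaves it unchanged, which is why the loop settles into a nearby minimum instead of cycling.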
6.3. The Boltzmann Machine Model

The Boltzmann Machine model is an ANS that uses the Boltzmann probability distribution function to adjust connection strengths. The Boltzmann distribution function is a ratio of probabilities:

$$\frac{P_\alpha}{P_\beta} = e^{-(E_\alpha - E_\beta)/T}$$

where $P_\alpha$ is the probability of being in energy state $\alpha$, $P_\beta$ is the probability of being in energy state $\beta$, $E_\alpha$ is the energy of state $\alpha$, $E_\beta$ is the energy of state $\beta$, and $T$ is the temperature of the energy system [Jorgensen86, Rumelhart86d].

The system must be in either energy state $\alpha$ or energy state $\beta$, so $P_\alpha + P_\beta = 1$. In the ANS form of the distribution equation, the energy of state $\beta$ is assigned to the variable $\theta$. Solving for the probability of being in state $\alpha$ gives the following:

$$P_\alpha = \frac{1}{1 + e^{(E_\alpha - \theta)/T}}$$

The ANS equivalent of the probability of being in state $\alpha$ in the above equation is the desired output value of the $j$-th PE, where $j$ is one of the N output PEs in the output layer of a two-layer ANS. This value is mathematically referred to as $out_j$, the $j$-th element of the desired output vector out. The ANS equivalent of the energy of state $\alpha$ is derived from $net_j$, the computed output for the $j$-th PE of the output layer. This value is computed by multiplying each connection $w_{ij}$ into the $j$-th PE by the corresponding output value $in_i$ of the $i$-th PE of the input layer and summing the products:

$$net_j = \sum_i w_{ij}\, in_i$$

In this equation $w_{ij}$ is the connection strength (weight) of the connection from input PE $i$ to output PE $j$. Because a larger $net_j$ should make the PE more likely to be on, the energy of state $\alpha$ corresponds to $-net_j$ and the energy $\theta$ of state $\beta$ corresponds to $-\theta_j$, so the exponent $(E_\alpha - \theta)/T$ becomes $-(net_j - \theta_j)/T$. Using this equivalence, the ANS form of the Boltzmann distribution function is as follows:

$$out_j = \frac{1}{1 + e^{-(net_j - \theta_j)/T}}$$

The value of $\theta_j$ is a bias associated with each PE. The values this function produces lie between 0 and 1. Plotting the probability that the PE is on, $p(on)$, against $(net_j - \theta_j)/T$ produces the curve shown in figure 4. Because the curve has an "S" (sigmoid) shape and is bounded between 0 and 1, the function can be considered a threshold function: a threshold function has an output of 1 when it is firing and 0 when it is not.
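A minimal Python version of the threshold function above, assuming a scalar $net_j$, bias $\theta_j$, and temperature $T$ (the name `p_on` is hypothetical). It also shows the effect of temperature that the annealing discussion relies on: a high $T$ makes firing close to a coin flip, while a low $T$ approaches a hard 0/1 threshold.

```python
import math


def p_on(net: float, theta: float = 0.0, T: float = 1.0) -> float:
    """Probability that a PE is on: 1 / (1 + e^{-(net - theta)/T}).

    The two branches compute the same value but avoid overflow in math.exp.
    """
    x = (net - theta) / T
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


# At net = theta the PE is on half the time; a higher T flattens the curve.
for T in (0.5, 1.0, 5.0):
    print(T, [round(p_on(net, T=T), 3) for net in (-2.0, 0.0, 2.0)])
```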
Figure 4: The threshold function curve. The threshold function used to calculate the probability of firing. The x-axis shows values of $(net_j - \theta_j)/T$ and the y-axis indicates the corresponding probability of the PE being on, $p(on)$.

The Boltzmann Machine ANS also uses a different learning process, applying the concepts of simulated annealing. By using the Boltzmann Machine equation above as the threshold evaluation function for each PE, and by regulating the temperature value $T$, the ANS can learn more effectively. Creating an ANS creates an energy terrain, and adjusting the weights changes that terrain. The best energy terrain has a deep energy well for each entry point. Sometimes the entry point into the ANS is not at a good location, so the needed energy minimum cannot be reached from it. Adding energy to the ANS, called adding noise, gives the system a better chance of finding the energy well from the entry point. Restated, when an ANS is started at an entry point in the energy terrain, it seeks the local minimum. What is wanted is not the local energy minimum but the global energy minimum. By adding noise, it is possible to bounce out of local minima and eventually find the global minimum. Noise is added to the system by increasing the temperature $T$; by slowly lowering the temperature (reducing the amount of noise), the ANS performs simulated annealing.
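The annealing idea can be illustrated on a hypothetical one-dimensional energy terrain rather than on an actual Boltzmann Machine. The sketch assumes the Metropolis acceptance rule (a common simulated-annealing rule that this paper does not specify), a geometric cooling schedule, and made-up energies. `descend` plays the role of the noise-free search that stays near its entry point, and `anneal` plays the role of adding noise and then slowly removing it.

```python
import math
import random

# Hypothetical energy terrain: index = state, value = energy of that state.
# State 2 is a shallow local well; state 8 is the deep (global) well.
TERRAIN = [5.0, 3.0, 1.0, 3.0, 4.0, 3.0, 1.0, -2.0, -4.0, -1.0, 2.0]


def neighbours(s, n):
    return [t for t in (s - 1, s + 1) if 0 <= t < n]


def descend(terrain, start):
    """Zero-noise search: move to the lowest neighbour until none is lower."""
    s = start
    while True:
        best = min(neighbours(s, len(terrain)), key=lambda t: terrain[t])
        if terrain[best] >= terrain[s]:
            return s
        s = best


def anneal(terrain, start, t_start=5.0, t_end=0.05, cooling=0.95,
           steps_per_t=50, seed=0):
    """Simulated annealing with the Metropolis acceptance rule."""
    rng = random.Random(seed)
    s, t = start, t_start
    while t > t_end:
        for _ in range(steps_per_t):
            cand = rng.choice(neighbours(s, len(terrain)))
            delta = terrain[cand] - terrain[s]
            # Downhill moves are always taken; uphill moves with prob e^{-delta/t}.
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                s = cand
        t *= cooling          # slowly lower the temperature (less noise)
    return s


print(descend(TERRAIN, 1))    # -> 2, trapped in the local well
print(anneal(TERRAIN, 1))     # usually 8, the global well
```

Starting from the same entry point, the noise-free search stops in the shallow well, while the annealed search can climb the barrier at high temperature and then settle as the temperature falls.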

In contrast to the Boltzmann Machine model, the Hebb/Hopfield models are confined to their point of entry and are easily trapped in a local minimum. The Boltzmann Machine model is able to overcome that problem and, given a slow enough annealing schedule, find the global minimum. Appendix 2 gives an algorithm for mapping patterns of input vectors to output vectors using this model.

6.4. The Rumelhart/Williams Error Propagation Model

The models presented up to this point, the Hebb/Hopfield and Boltzmann Machine models, are two-layer ANSs. Many problems in the real world cannot be represented in a two-layer system. One is the exclusive-or (XOR) function [Rumelhart86c]: because no values of the connection strengths give the appropriate output for all inputs, the two-layer ANS is inadequate. Table 2 describes the XOR function, showing the input values $in_0$ and $in_1$ and the corresponding output value $out_0$.

Table 2: The XOR function. The input values $in_0$ and $in_1$ and corresponding output values $out_0$ cannot be represented in a two-layer ANS. Using a three-layer ANS this mapping is possible.

in0   in1   out0
---   ---   ----
 0     0     0
 0     1     1
 1     0     1
 1     1     0

Using the two-layer ANS shown in figure 5, no weight assignments can be made to $w_{00}$ and $w_{01}$ that will give the proper output for each of the four XOR input patterns shown in table 2.
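A short worked argument for this claim, assuming the single output PE of figure 5 fires exactly when $w_{00}\,in_0 + w_{01}\,in_1$ exceeds some threshold $\theta$ (the threshold symbol is an assumption introduced here). Each row of table 2 imposes one condition:

$$
\begin{aligned}
(in_0, in_1) = (0,0) \mapsto 0 &: \quad 0 \le \theta \\
(1,0) \mapsto 1 &: \quad w_{00} > \theta \\
(0,1) \mapsto 1 &: \quad w_{01} > \theta \\
(1,1) \mapsto 0 &: \quad w_{00} + w_{01} \le \theta
\end{aligned}
$$

Adding the middle two conditions gives $w_{00} + w_{01} > 2\theta \ge \theta$ (the last step uses $\theta \ge 0$ from the first condition), which contradicts the fourth condition. So no choice of $w_{00}$, $w_{01}$, and $\theta$ works.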

The solution to this problem is to introduce a third layer, called the hidden layer, between the input and output layers. The hidden layer allows the ANS to form an internal representation that makes difficult mappings between input and output patterns possible. By adding the middle layer shown in figure 6 to the ANS shown in figure 5, the XOR function becomes representable [Rumelhart86c].
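A hand-wired three-layer XOR network in Python, for illustration only: the weights and thresholds below were chosen by hand (they are not taken from figure 6), and the two hidden PEs are assumed to act as an OR unit and an AND unit.

```python
def step(net, theta):
    """Hard threshold: fire (1) when the weighted sum exceeds theta."""
    return 1 if net > theta else 0


def xor_three_layer(in0, in1):
    # Hidden PE h0 fires if at least one input is on (an OR unit).
    h0 = step(1.0 * in0 + 1.0 * in1, 0.5)
    # Hidden PE h1 fires only if both inputs are on (an AND unit).
    h1 = step(1.0 * in0 + 1.0 * in1, 1.5)
    # Output fires for "OR but not AND", which is XOR.
    return step(1.0 * h0 - 1.0 * h1, 0.5)


print([xor_three_layer(a, b) for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]])
# -> [0, 1, 1, 0]
```

The hidden layer re-describes the inputs as "at least one on" and "both on"; in that internal representation XOR is a single threshold decision.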

R. Hecht-Nielsen has taken this idea a step further by applying a mathematical existence theorem to ANSs. Using the Kolmogorov Existence Theorem, which states that any continuous mapping can be done in a three-layer system, Hecht-Nielsen has shown that if there is a continuous mapping from input to output, there exists a three-layer ANS that can represent that mapping [Hecht-Nielsen86a].

Figure 5: A two-layer ANS for the XOR function. The weights $w_{00}$ and $w_{01}$ cannot assume values that will produce the proper output for all four input patterns of the XOR function.

D. Rumelhart and R. Williams of UCSD discovered an algorithm that can find the weights for such a mapping. The Rumelhart/Williams Error Propagation algorithm works for ANSs with three (or more) layers. In the three-layer system, the weights into the output layer are adjusted according to an error function that calculates a weight adjustment based on the difference between the target output and the computed output of each output PE. The error value for each output PE is then propagated backward to the hidden layer and used to adjust the weights into the hidden PEs. The adjustment for each hidden PE is calculated by passing the output errors back through the hidden-to-output weights and multiplying by the derivative of that hidden PE's threshold function. Using the derivative in this way adjusts the hidden-layer weights in the direction that reduces the output error.
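The error-propagation procedure can be sketched in Python for a three-layer ANS learning XOR. This is a hedged sketch, not Rumelhart and Williams' exact formulation: it assumes sigmoid threshold functions, a bias weight stored as the last row of each weight matrix, a squared-error measure, online updates with learning rate 0.5, and three hidden PEs; all names are hypothetical. `gradient_check` compares one update against a finite-difference estimate, a common way to confirm that the backward pass computes the true error derivative.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def forward(w_ih, w_ho, x):
    """Return (inputs + bias, hidden + bias, outputs).

    The last row of w_ih and of w_ho holds bias weights fed by a constant 1.
    """
    xb = list(x) + [1.0]
    h = [sigmoid(sum(xb[i] * w_ih[i][k] for i in range(len(xb))))
         for k in range(len(w_ih[0]))]
    hb = h + [1.0]
    o = [sigmoid(sum(hb[k] * w_ho[k][j] for k in range(len(hb))))
         for j in range(len(w_ho[0]))]
    return xb, hb, o


def total_error(w_ih, w_ho, data):
    """Sum over patterns of 1/2 * (target - output)^2."""
    return sum(0.5 * sum((t - o) ** 2 for t, o in zip(target, forward(w_ih, w_ho, x)[2]))
               for x, target in data)


def train_step(w_ih, w_ho, x, target, eta):
    """One error-propagation update for a single pattern (weights change in place)."""
    xb, hb, o = forward(w_ih, w_ho, x)
    n_hidden = len(w_ih[0])
    # Output-layer error: (target - output) times the sigmoid derivative o(1 - o).
    d_out = [(t - oj) * oj * (1.0 - oj) for t, oj in zip(target, o)]
    # Hidden-layer error: output errors sent back through the hidden-to-output
    # weights (before they are updated), times the derivative h(1 - h).
    d_hid = [hb[k] * (1.0 - hb[k]) * sum(d_out[j] * w_ho[k][j] for j in range(len(d_out)))
             for k in range(n_hidden)]
    for k in range(len(hb)):
        for j in range(len(d_out)):
            w_ho[k][j] += eta * d_out[j] * hb[k]
    for i in range(len(xb)):
        for k in range(n_hidden):
            w_ih[i][k] += eta * d_hid[k] * xb[i]


def random_weights(rng, n_in, n_hidden, n_out):
    w_ih = [[rng.uniform(-1, 1) for _ in range(n_hidden)] for _ in range(n_in + 1)]
    w_ho = [[rng.uniform(-1, 1) for _ in range(n_out)] for _ in range(n_hidden + 1)]
    return w_ih, w_ho


XOR = [((0, 0), (0,)), ((0, 1), (1,)), ((1, 0), (1,)), ((1, 1), (0,))]


def train_xor(n_hidden=3, eta=0.5, epochs=5000, seed=1):
    w_ih, w_ho = random_weights(random.Random(seed), 2, n_hidden, 1)
    before = total_error(w_ih, w_ho, XOR)
    for _ in range(epochs):
        for x, target in XOR:
            train_step(w_ih, w_ho, x, target, eta)
    return w_ih, w_ho, before, total_error(w_ih, w_ho, XOR)


def gradient_check(seed=0, h=1e-6):
    """|backprop update (eta = 1) + finite-difference dE/dw| for one hidden weight."""
    w_ih, w_ho = random_weights(random.Random(seed), 2, 2, 1)
    data = [((1, 0), (1,))]
    upd_ih = [row[:] for row in w_ih]
    upd_ho = [row[:] for row in w_ho]
    train_step(upd_ih, upd_ho, data[0][0], data[0][1], eta=1.0)
    plus = [row[:] for row in w_ih]
    plus[0][1] += h
    minus = [row[:] for row in w_ih]
    minus[0][1] -= h
    numeric = (total_error(plus, w_ho, data) - total_error(minus, w_ho, data)) / (2 * h)
    return abs((upd_ih[0][1] - w_ih[0][1]) + numeric)  # update should equal -dE/dw


w_ih, w_ho, before, after = train_xor()
print(f"error before {before:.3f}, after {after:.3f}")
print([round(forward(w_ih, w_ho, x)[2][0], 2) for x, _ in XOR])
```

With a different seed or fewer hidden PEs, online error propagation can settle into a local minimum on XOR, so a small final error is not guaranteed.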

6.5.  Summary 

These models represent an adequate cross-section of ANS technology for the purposes of this survey. In summary, each model is successively more complex and better able to store representations. The Hebb/Hopfield models are the least complex and the most limited at storing representations; the Error Propagation model is the most complex and the best at storing representations. The Boltzmann Machine falls between the Hebb/Hopfield and the Error Propagation models in both representation ability and complexity.

7. The Neurocomputers

One unavoidable problem associated with implementing ANSs is the massive amount of computing that is necessary. As the models get larger and more complex, the computing time grows rapidly; in a completely interconnected ANS, the number of connections between two layers is the product of the layer sizes. One solution to the problem is to build a computer that is architecturally suited to ANSs. Because an ANS is massively parallel in nature, a computer built with thousands of processors, each taking the place of one PE, could address this problem.

Figure 6: The three-layer ANS for the XOR function. This ANS allows the representation of the XOR function. The addition of the hidden units between the input and output layers creates an internal representation that makes the difficult XOR mapping possible.

Many attempts are being made at building ANS computers. The computers being designed for ANS modeling are called neurocomputers and are currently being implemented in two frameworks: electro-optical and electronic [Hecht-Nielsen86b].

Electro-optical computers are designed to use light for the connections between PEs. Because beams of light can cross without interfering, light is a good medium for implementing the large number of connections needed between PEs. Leaders in this research include C. Guest of UCSD and B. Kosko [Kosko87], D. Psaltis and Y. Abu-Mostafa of CalTech with N. Farhat of the University of Pennsylvania [Abu-Mostafa87, Brown86a], and Szu of Naval Research Laboratories [Brown86b].

The other method used to implement neurocomputers is electronic. Such neurocomputers have all the interconnections hard-wired and use available transistors and hardware in their implementation. Because of dramatic advances in electronic circuitry, size and cost are no longer the inhibiting factors they were 10 years ago. One leader in this area is R. Hecht-Nielsen, who has started his own company that produces neurocomputers, the Hecht-Nielsen Neurocomputer Corporation (HNC). HNC will market a board that fits into an IBM PC/AT card slot and can be used to implement many different neural models, including those presented in this paper [Brown87, HNC86]. This neurocomputer has a capacity of 30,000 PEs with 300,000 interconnections.

Other leaders in electronic neurocomputers include Nestor and TRW. Nestor is the only company that has marketed an ANS application: a patented hardware system that allows handwritten input to a computer via a digitized pad [Nestor86]. TRW has entered the ANS marketplace with its Mark III neurocomputer [TRW86], designed by Hecht-Nielsen before he left TRW in late 1986 and formed his own company.

Neurocomputers will not replace existing computers. The neurocomputers being designed are subordinate components of a von Neumann computer: programs that use a neurocomputer make a subroutine call to it to do its specialized work. Neurocomputers will have their own software, which will be able to integrate with existing software to create a machine with added capability and potential.
Appendix 1


1.  The  Hebb/Hopfield  Algorithms 

The following are learning and recall algorithms for the Hebb and Hopfield ANS models. These algorithms are designed to store a single pattern. A pattern is a mapping of an input vector, in, to an output vector, out. If the input and output vectors are the same, as they are in these models, the model is acting as an associative memory. Although these models are designed to store one binary-valued vector, they can easily be extended to store several binary vectors. These models can also be extended to store mappings between vectors differing in length, value, or both.

1.1.  The  Hebb  Model  Learning  Algorithm 

The  Hebb  model  learning  algorithm  is  as  follows: 

1. Given an input vector of length N called in, with binary values from $in_0$ to $in_{N-1}$.

2. Construct a duplicate of the input vector and call this vector out; it is also indexed from 0 to N-1.

    for i = 0 to (N-1) do
        out_i = in_i
    enddo

3. Construct an N x N weight matrix called W that is initialized to all zeroes.

    for i = 0 to (N-1) do
        for j = 0 to (N-1) do
            w_ij = 0
        enddo
    enddo

4. Generate values for each position $w_{ij}$ in the matrix, where $i \ne j$, according to the similarity of $in_i$ and $out_j$ as follows:

• If $in_i$ and $out_j$ are both equal to 1, add strength to the connection between them (by increasing the value stored for this connection in the weight matrix).

• If $in_i$ and $out_j$ are both equal to 0, subtract strength from the connection between them (by decreasing the value stored for this connection in the weight matrix).

• Otherwise, continue.

The algorithmic form is as follows:
    for i = 0 to (N-1) do
        for j = 0 to (N-1) do
            if i ≠ j then
                if in_i = out_j = 1 then
                    w_ij = w_ij + 1
                elseif in_i = out_j = 0 then
                    w_ij = w_ij - 1
                endif
            endif
        enddo
    enddo


The following example illustrates how this algorithm works:

1. Consider the five-element input vector in = 01110.

2. Creating a duplicate vector gives the output vector out = 01110.

3. Creating a weight matrix W of dimensions 5 x 5 with each entry initialized to zero yields the following matrix:

$$
W = \begin{bmatrix}
0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0
\end{bmatrix}
$$

4. Generating the values for W according to the Hebb learning rule results in the following computations:

$in_0 = 0$ and $out_1 = 1$, so $w_{01}$ is unchanged

$in_0 = 0$ and $out_2 = 1$, so $w_{02}$ is unchanged

$in_0 = 0$ and $out_3 = 1$, so $w_{03}$ is unchanged

$in_0 = 0$ and $out_4 = 0$, so $w_{04} = -1$

Continuing with this for $in_1$, $in_2$, $in_3$, and $in_4$ yields the adjusted matrix:

$$
W = \begin{bmatrix}
0 & 0 & 0 & 0 & -1 \\
0 & 0 & 1 & 1 & 0 \\
0 & 1 & 0 & 1 & 0 \\
0 & 1 & 1 & 0 & 0 \\
-1 & 0 & 0 & 0 & 0
\end{bmatrix}
$$
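Step 4 can be written as a short Python function. This sketch assumes vectors are given as Python lists of 0/1 integers; the function name `hebb_weights` is hypothetical. For in = 01110 it reproduces the adjusted matrix above.

```python
def hebb_weights(vec_in, vec_out):
    """Hebb learning as in step 4: +1 where in_i = out_j = 1, -1 where both are 0."""
    n = len(vec_in)
    w = [[0] * n for _ in range(n)]          # step 3: N x N matrix of zeroes
    for i in range(n):
        for j in range(n):
            if i != j:
                if vec_in[i] == vec_out[j] == 1:
                    w[i][j] += 1
                elif vec_in[i] == vec_out[j] == 0:
                    w[i][j] -= 1
    return w


pattern = [0, 1, 1, 1, 0]
W = hebb_weights(pattern, list(pattern))     # step 2: out is a copy of in
for row in W:
    print(row)
```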
1.2. The Hebb Model Recall Algorithm

The weight matrix should now hold the vector 01110. To recall this vector from the matrix, the following algorithm is used:

1. Given a weight matrix W of dimensions N x N that is storing a vector of length N.

2. Sum the connection strengths entering each output element j (column j of W; because W is symmetric here, this equals the sum over row j) and store each sum in a vector net at the j-th index.

    for j = 0 to (N-1) do
        sum = 0
        for i = 0 to (N-1) do
            sum = sum + w_ij
        enddo
        net_j = sum
    enddo

3. Test each element of the vector net and reset its value as follows:

    for j = 0 to (N-1) do
        if net_j > 0 then
            net_j = 1
        elseif net_j < 0 then
            net_j = 0
        else
            net_j = 1 or net_j = 0, each with a probability of 0.5
        endif
    enddo

Using the W computed for the vector in = 01110, we get the following:

$$
\begin{aligned}
net_0 &= 0 + 0 + 0 + 0 + (-1) = -1 \\
net_1 &= 0 + 0 + 1 + 1 + 0 = 2 \\
net_2 &= 0 + 1 + 0 + 1 + 0 = 2 \\
net_3 &= 0 + 1 + 1 + 0 + 0 = 2 \\
net_4 &= (-1) + 0 + 0 + 0 + 0 = -1
\end{aligned}
$$


From the values computed above, we reset each value of the vector net as follows:

$net_0 < 0$, so $net_0 = 0$

$net_1 > 0$, so $net_1 = 1$

$net_2 > 0$, so $net_2 = 1$

$net_3 > 0$, so $net_3 = 1$

$net_4 < 0$, so $net_4 = 0$

and the vector recalled from the matrix is 01110, the same vector the ANS was taught.
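A Python sketch of the recall algorithm, assuming the weight matrix is a list of rows indexed `w[i][j]`. The name `hebb_recall` is hypothetical, and ties are broken with Python's `random.choice`. Applied to the matrix learned above, it reproduces the sums and the recalled vector 01110.

```python
import random


def hebb_recall(w, rng=random):
    """Sum the weights entering each output PE j, then threshold (ties -> 0 or 1 at random)."""
    n = len(w)
    net = [sum(w[i][j] for i in range(n)) for j in range(n)]   # step 2
    out = []
    for value in net:                                           # step 3
        if value > 0:
            out.append(1)
        elif value < 0:
            out.append(0)
        else:
            out.append(rng.choice((0, 1)))                      # probability 0.5 each
    return net, out


W = [[0, 0, 0, 0, -1],
     [0, 0, 1, 1, 0],
     [0, 1, 0, 1, 0],
     [0, 1, 1, 0, 0],
     [-1, 0, 0, 0, 0]]
net, recalled = hebb_recall(W)
print(net, recalled)   # -> [-1, 2, 2, 2, -1] [0, 1, 1, 1, 0]
```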

1.3. The Hopfield Model Learning Algorithm

In the Hopfield model, step 4 of the Hebb learning algorithm is changed to the following:

4. Generate values for each position $w_{ij}$ in the matrix, where $i \ne j$, using the following equation:

$$w_{ij} = (2\,in_i - 1)(2\,out_j - 1)$$

This equation does the following:

• If $in_i$ and $out_j$ are both equal to 1 or both equal to 0, store a 1 at matrix position $w_{ij}$.

• Otherwise, store a -1 at matrix position $w_{ij}$.

The algorithm for this step is as follows:

    for i = 0 to (N-1) do
        for j = 0 to (N-1) do
            if i ≠ j then
                w_ij = (2 * in_i - 1) * (2 * out_j - 1)
            endif
        enddo
    enddo


1.4.  The  Hopfield  Model  Recall  Algorithm 

In the Hopfield model, step 3 of the Hebb recall algorithm is changed. When constructing the recall vector net, use the following:

3.  Test  each  element  of  the  vector  net  and  reset  its  value  as  follows: 
    for j = 0 to (N-1) do
        if net_j >= 0 then
            net_j = 1
        else
            net_j = 0
        endif
    enddo


Using the same example given before, with the vector in = 01110, we get the following adjusted matrix W:

$$
W = \begin{bmatrix}
0 & -1 & -1 & -1 & 1 \\
-1 & 0 & 1 & 1 & -1 \\
-1 & 1 & 0 & 1 & -1 \\
-1 & 1 & 1 & 0 & -1 \\
1 & -1 & -1 & -1 & 0
\end{bmatrix}
$$
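The Hopfield learning rule and the modified recall step can be sketched together in Python, under the same assumptions as the Hebb sketches (0/1 integer lists, hypothetical function names). Applied to in = 01110, it reproduces the matrix above; the sums entering the output PEs are -2, 0, 0, 0, -2, so the modified step 3 recalls 01110.

```python
def hopfield_weights(vec_in, vec_out):
    """w_ij = (2 in_i - 1)(2 out_j - 1) for i != j; the diagonal stays 0."""
    n = len(vec_in)
    return [[0 if i == j else (2 * vec_in[i] - 1) * (2 * vec_out[j] - 1)
             for j in range(n)] for i in range(n)]


def hopfield_recall(w):
    """Sum the weights entering each output PE j; net_j >= 0 gives 1, otherwise 0."""
    n = len(w)
    net = [sum(w[i][j] for i in range(n)) for j in range(n)]
    return net, [1 if value >= 0 else 0 for value in net]


pattern = [0, 1, 1, 1, 0]
W = hopfield_weights(pattern, pattern)
net, recalled = hopfield_recall(W)
print(net, recalled)   # -> [-2, 0, 0, 0, -2] [0, 1, 1, 1, 0]
```

Because the Hopfield rule never leaves an off-diagonal entry at zero, ties at exactly 0 are common, which is why the modified step 3 resolves them deterministically to 1.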
Appendix 2

1.  The  Boltzmann  Machine  Model  Algorithms 

The  following  are  the  learning  and  recall  algorithms  for  the  Boltzmann  Machine 
ANS  model.  In  the  learning  stage  this  ANS  model  associates  an  input  vector  with  a 
different  output  vector.  In  the  recall  stage  this  ANS  model  is  given  an  input  vector 
and  the  associated  output  vector  is  recalled.  This  type  of  model  is  considered  a  pattern 
association  model. 

1.1. The Boltzmann Machine Learning Algorithm

The Boltzmann Machine Learning Algorithm is implemented in this model using a tolerance value to test when the model has satisfactorily learned all the patterns it has been presented. This model continues to learn until all the differences between the computed output value $net_j$ and the target value $out_j$ are within the specified tolerance. This model starts with a tolerance of 0.3, and as the annealing process progresses the tolerance value decreases.

This model assigns a base temperature ($T$) of 25 and a base learning rate ($\eta$) of 0.1 at the start. These two values are related as follows: the higher the temperature, the lower the learning rate needs to be, and vice versa. In this model the learning rate ($\eta$) increases by 0.1 each time the temperature ($T$) decreases by 5. The changes in $T$ and $\eta$ occur each time the tolerance is satisfied, and each time the tolerance is satisfied it is decreased by 0.05.

A new notation is also introduced to show the pattern number being learned. In the algorithm that follows, $in_{pi}$ represents the $p$-th input vector's $i$-th element, and $out_{pj}$ represents the $p$-th output vector's $j$-th element. The threshold $\theta$ is set to zero throughout this model. The weight matrix w is initialized to random values between 0 and 0.3 to prevent any oscillations that might occur from a weight matrix of all zeroes. An epoch is a complete cycle through all weight adjustments for all patterns the ANS is being presented. The learning algorithm is as follows:

1. Set $T = 25$, $\eta = 0.1$, $tol = 0.3$, $epoch = 0$ and $\theta = 0$.

2. Initialize the weight matrix to hold random values between 0 and 0.3. The length of the input vectors is N and the length of the output vectors is M, so the weight matrix is an N x M matrix.

    for i = 0 to (N-1) do
        for j = 0 to (M-1) do
            w_ij = random value from 0 to 0.3
        enddo
    enddo

3. Get the input and output vectors. The input vector is of length N, the output vector is of length M, and there are P of these associations (patterns).

    /* Get input vectors */
    for p = 0 to (P-1) do
        for i = 0 to (N-1) do
            in_pi = binary value
        enddo
    enddo

    /* Get output vectors */
    for p = 0 to (P-1) do
        for j = 0 to (M-1) do
            out_pj = binary value
        enddo
    enddo

4. Now that the weight matrix w is initialized and the patterns are stored, learning can begin. To anneal the ANS, two flags are used. The first is learn_flag, which tells when the whole ANS is done learning. The other is tol_flag, which tells when the ANS has learned within the specified tolerance and is ready to have $T$, $tol$, and $\eta$ adjusted for the next step in annealing.
    set learn_flag = TRUE
    while learn_flag = TRUE do
        set tol_flag = FALSE
        while tol_flag = FALSE do
            set tol_flag = TRUE
            for p = 0 to (P-1) do
                for j = 0 to (M-1) do
                    net_j = 0
                    for i = 0 to (N-1) do
                        net_j = net_j + w_ij * in_pi
                    enddo
                    net_j = 1 / (1 + e^(-(net_j - θ)/T))
                    for i = 0 to (N-1) do
                        w_ij = w_ij + η * (out_pj - net_j) * in_pi
                    enddo
                    if tol_flag = TRUE and |out_pj - net_j| > tol then
                        tol_flag = FALSE
                    endif
                enddo
            enddo
            epoch = epoch + 1
        endwhile
        if tol <= 0.1 then
            learn_flag = FALSE
        else
            T = T - 5
            η = η + 0.1
            tol = tol - 0.05
        endif
    endwhile

This fourth step of the learning algorithm continues learning each pattern at each tolerance until all the tolerances are satisfied. This ANS follows an annealing schedule that starts at $T = 25$, $\eta = 0.1$, and $tol = 0.3$, finishing with $T = 5$, $\eta = 0.5$ and $tol = 0.1$.
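A Python sketch of the learning algorithm, assuming the schedule as given in step 4 ($T$ falls by 5, $\eta$ rises by 0.1, and $tol$ falls by 0.05 each time the tolerance is met, stopping once $tol = 0.1$ has been met). The seed, the `max_epochs` safety limit, and the names `learn` and `f_boltz` are assumptions added for illustration. The number of epochs depends on the random initial weights, so it need not match the figure of about 1700 epochs reported below.

```python
import math
import random


def f_boltz(net, theta, T):
    """Boltzmann threshold function: 1 / (1 + e^{-(net - theta)/T})."""
    return 1.0 / (1.0 + math.exp(-(net - theta) / T))


def learn(inputs, outputs, seed=0, max_epochs=50_000):
    """Annealed learning following step 4; returns (w, final T, epochs used)."""
    rng = random.Random(seed)
    n, m = len(inputs[0]), len(outputs[0])
    w = [[rng.uniform(0.0, 0.3) for _ in range(m)] for _ in range(n)]  # step 2
    theta = 0.0
    # Annealing schedule (T, eta, tol): from (25, 0.1, 0.30) down to (5, 0.5, 0.10).
    schedule = [(25 - 5 * k, 0.1 * (k + 1), 0.30 - 0.05 * k) for k in range(5)]
    epoch = 0
    for T, eta, tol in schedule:
        tol_flag = False
        while not tol_flag:                     # repeat epochs until within tol
            if epoch >= max_epochs:
                raise RuntimeError("tolerance not reached within max_epochs")
            tol_flag = True
            for x, target in zip(inputs, outputs):
                for j in range(m):
                    net = f_boltz(sum(w[i][j] * x[i] for i in range(n)), theta, T)
                    for i in range(n):          # weight adjustment from step 4
                        w[i][j] += eta * (target[j] - net) * x[i]
                    if abs(target[j] - net) > tol:
                        tol_flag = False
            epoch += 1
    return w, T, epoch


IN = [[1, 0, 0, 0, 1], [1, 1, 1, 0, 0], [1, 0, 1, 0, 1]]
OUT = [[0, 1, 1, 1, 0], [0, 0, 1, 1, 1], [0, 1, 0, 1, 0]]
w, T, epochs = learn(IN, OUT)
print(f"{epochs} epochs, final T = {T}")
```

Stepping through a precomputed schedule, rather than repeatedly subtracting 0.05 from a floating-point tolerance, avoids an equality test on an accumulated float when deciding whether the last stage has been reached.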

1.2.  The  Boltzmann  Machine  Recall  Algorithm 

The recall algorithm is much simpler than the learning algorithm. This algorithm is given an input vector test and computes an output vector recall from test and the connection strengths stored in w.

1. Get the input vector test.

    for i = 0 to (N-1) do
        test_i = some binary value
    enddo

2. From the given input vector, calculate an output vector recall by doing the following (θ = 0, and T is the final temperature reached during learning):

    for j = 0 to (M-1) do
        net_j = 0
        for i = 0 to (N-1) do
            net_j = net_j + test_i * w_ij
        enddo
        recall_j = 1 / (1 + e^(-(net_j - θ)/T))
    enddo


The output vector recall should resemble the output vector originally associated with the given input vector test. An example of this ANS is as follows. If you train the ANS with the three patterns

    10001 -> 01110
    11100 -> 00111
    10101 -> 01010

the ANS takes approximately 1700 epochs to satisfy all the specified tolerances. The final weight matrix W is:


$$
W = \begin{bmatrix}
-11.88 & 2.75 & 26.52 & 11.80 & -2.87 \\
-3.42 & -18.54 & 33.53 & 3.57 & 18.72 \\
-6.45 & -4.23 & -40.04 & 6.25 & 4.24 \\
0.30 & 0.10 & 0.30 & 0.10 & 0.20 \\
-8.16 & 21.50 & -6.50 & 8.23 & -21.39
\end{bmatrix}
$$


This  is  the  same  model  used  by  Rumelhart  and  McClelland  in  their  study  in 
which  the  past  tense  of  English  verbs  was  learned.  Their  model  was  much  larger 
(more  input  and  output  PEs),  but  it  computes  the  same  way. 
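The recall algorithm can be checked against the final weight matrix reported above. This sketch assumes recall uses the final temperature of the annealing schedule, $T = 5$, and $\theta = 0$; `recall` is a hypothetical name. Each of the three training inputs produces outputs within about 0.02 of its target.

```python
import math

# Final weight matrix reported above; W[i][j] joins input PE i to output PE j.
W = [[-11.88, 2.75, 26.52, 11.80, -2.87],
     [-3.42, -18.54, 33.53, 3.57, 18.72],
     [-6.45, -4.23, -40.04, 6.25, 4.24],
     [0.30, 0.10, 0.30, 0.10, 0.20],
     [-8.16, 21.50, -6.50, 8.23, -21.39]]


def recall(test, w, T=5.0, theta=0.0):
    """net_j = sum_i test_i * w_ij, then recall_j = 1 / (1 + e^{-(net_j - theta)/T})."""
    n, m = len(w), len(w[0])
    net = [sum(test[i] * w[i][j] for i in range(n)) for j in range(m)]
    return [1.0 / (1.0 + math.exp(-(v - theta) / T)) for v in net]


for pattern in ([1, 0, 0, 0, 1], [1, 1, 1, 0, 0], [1, 0, 1, 0, 1]):
    print(pattern, [round(p, 2) for p in recall(pattern, W)])
```

Row 3 of the matrix stays near its random initial range because $in_3$ is 0 in every training pattern, so step 4 never adjusts those weights.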


